Abstract: Plagiarism is a growing problem in academia. Academics commonly use plagiarism detection tools to find similar source-code files. Once similar files are detected, the academic must investigate them, a process that involves identifying the similar source-code fragments within them; these fragments can then be used as evidence for proving plagiarism. The tool described here implements a new approach for investigating the similarity between source-code files with a view to gathering such evidence. It presents graphical evidence that supports the investigation of source-code fragments with regard to their contribution toward proving plagiarism. The graphical evidence indicates the relative importance of the given source-code fragments across files in a corpus. This importance is determined using Latent Semantic Analysis (LSA), an information retrieval technique, which measures how important the fragments are within the specific files under investigation relative to the other files in the corpus.

Keywords: Java corpus, Latent Semantic Analysis (LSA), cosine similarity.
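As a rough illustration of the LSA and cosine-similarity machinery the abstract relies on, the Python sketch below builds a term-by-file matrix from a toy Java corpus, reduces it to k dimensions, and compares files by cosine similarity. It is a minimal sketch under stated assumptions, not the tool's method: the tokenizer (identifiers and keywords only), the use of raw counts with no term weighting, the choice k = 2, and obtaining the SVD by power iteration on A^T A instead of a library routine are all assumptions, and every function name is hypothetical. It works at file level only and does not reproduce the tool's fragment-level importance measure or its graphical output.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Hypothetical tokenizer: keeps identifiers and keywords, ignores literals and operators.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)


def term_by_file_matrix(files):
    """Term-by-file matrix of raw term counts (assumption: no weighting scheme)."""
    counts = [Counter(tokenize(f)) for f in files]
    terms = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in terms]


def top_eigenpairs(g, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(g)
    g = [row[:] for row in g]  # work on a copy so the caller's matrix is untouched
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # asymmetric start vector
        for _ in range(iterations):
            w = [sum(g[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[r] * sum(g[r][c] * v[c] for c in range(n)) for r in range(n))
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenpair.
        for r in range(n):
            for c in range(n):
                g[r][c] -= lam * v[r] * v[c]
    return pairs


def lsa_file_vectors(files, k):
    """Project each file into a k-dimensional LSA space.

    With A = U S V^T, the eigenvectors of A^T A are the columns of V and its
    eigenvalues are the squared singular values, so row j of V_k S_k gives
    the coordinates of file j.
    """
    a = term_by_file_matrix(files)
    n = len(files)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


# Hypothetical toy corpus: two summing loops and two line-reading snippets.
corpus = [
    "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];",
    "int sum = 0; for (int j = 0; j < n; j++) sum += a[j];",
    "String line = reader.readLine(); System.out.println(line); reader.close();",
    "String text = reader.readLine(); System.out.println(text); reader.close();",
]
vecs = lsa_file_vectors(corpus, k=2)
print(f"loop vs renamed loop: {cosine(vecs[0], vecs[1]):.3f}")
print(f"loop vs I/O snippet:  {cosine(vecs[0], vecs[2]):.3f}")
```

In this toy corpus the two summing loops differ only in a renamed loop variable, and with k = 2 they map to essentially the same LSA direction (cosine close to 1), while the loop files and the I/O files come out nearly orthogonal. A production implementation would more likely call a library SVD and apply a term-weighting scheme before the decomposition.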